Comparing Constituency and Dependency Representations for SMT Phrase Extraction
Authors
Mary Hearne, Sylwia Ozdowska and John Tinsley
National Centre for Language Technology, Dublin City University, Glasnevin, Dublin 9, Ireland
{mhearne,sozdowska,jtinsley}@computing.dcu.ie
Abstract
We evaluate the use of syntactically motivated phrase-based translation techniques, alone or in combination with phrase-based techniques that are not syntactically motivated, and we compare the respective contributions of constituency parsing and dependency parsing in this setting. Starting from an English-French parallel corpus, we automatically build two parsed training corpora, one constituency-based and one dependency-based, aligned at the sub-sentential level, and extract from them syntactically motivated bilingual correspondences between words and phrases. We automatically measure the translation quality obtained by a phrase-based system. The results show that combining non-syntactic and syntactically motivated bilingual correspondences improves translation quality regardless of the type of parse used. Moreover, the gain in quality is larger with dependency parsing than with constituency parsing.
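The combination step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the two sets of phrase pairs are pooled and that translation probabilities are re-estimated by relative frequency; the abstract does not state how the combination is actually done. The function names and the toy phrase pairs are hypothetical.

```python
from collections import Counter, defaultdict


def combine_phrase_pairs(baseline_pairs, syntax_pairs):
    """Pool two lists of (source, target) phrase pairs and count occurrences.

    Assumption: pooling by simple count addition; the paper may weight
    or filter the two sources differently.
    """
    counts = Counter(baseline_pairs)
    counts.update(syntax_pairs)
    return counts


def relative_frequencies(counts):
    """Estimate p(target | source) as count(source, target) / count(source)."""
    source_totals = defaultdict(int)
    for (src, _tgt), c in counts.items():
        source_totals[src] += c
    return {(src, tgt): c / source_totals[src] for (src, tgt), c in counts.items()}


# Toy data (invented): pairs from word-alignment heuristics vs. aligned tree nodes.
baseline_pairs = [
    ("the house", "la maison"),
    ("the house", "la maison"),
    ("the house", "maison"),
]
syntax_pairs = [
    ("the house", "la maison"),
    ("house", "maison"),
]

counts = combine_phrase_pairs(baseline_pairs, syntax_pairs)
probs = relative_frequencies(counts)
```

In this toy setting, the syntax-derived pairs both reinforce an existing pair and add a new one ("house", "maison") that the baseline list lacked.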
Similar resources
Using Percolated Dependencies for Phrase Extraction in SMT
Statistical Machine Translation (SMT) systems rely heavily on the quality of the phrase pairs induced from large amounts of training data. Apart from the widely used method of heuristic learning of n-gram phrase translations from word alignments, there are numerous methods for extracting these phrase pairs. One such class of approaches uses translation information encoded in parallel treebanks ...
Converting Phrase Structures to Dependency Structures in Sanskrit
Two annotation schemes for presenting parsed structures are prevalent, viz. the constituency structure and the dependency structure. While the constituency trees mark the relations due to positions, the dependency relations mark the semantic dependencies. Free word order languages like Sanskrit pose more problems for constituency parses since the elements within a phrase are dislocated. In ...
Tree Representations for Chinese Semantic Role Labeling
We compare different parse tree representations for the task of Chinese Semantic Role Labeling (SRL), including dependency and constituency parse trees, two tree pruning methods, and neighbor features. Three learning models are compared. By using an SVM classifier with neighbor features and pruning the tree to phrase level, we achieve significantly better speed and accuracy than state of the art Chines...
Transition-Based Natural Language Parsing with Dependency and Constituency Representations
Hall, Johan, 2008. Transition-Based Natural Language Parsing with Dependency and Constituency Representations, Acta Wexionensia No 152/2008. ISSN: 1404-4307, ISBN: 978-91-7636-625-7. Written in English. This thesis investigates different aspects of transition-based syntactic parsing of natural language text, where we view syntactic parsing as the process of mapping sentences in unrestricted tex...
Statistical Dependency Parsing of Four Treebanks
Multilingual dependency parsing has been gaining popularity in recent years for several reasons. Dependency structures are more adequate than the traditional constituency notion for languages with freer word order. There is a growing availability of dependency treebanks for new languages. Broad coverage statistical dependency parsers are available and easily portable to new languages. Dependency pars...